
feat: scanPerformance + analyzeStructuralPerf MCP tools, Cartographer FFI expansion, perf optimizations #205

Open
SimplyLiz wants to merge 37 commits into develop from bench/compliance-scanner-baselines

@SimplyLiz (Owner) commented Apr 10, 2026

Summary

  • internal/perf package — new hidden-coupling scanner (Scan) and structural perf analyzer (AnalyzeStructural): detect files that co-change without static import edges, and find call expressions inside loop bodies in high-churn files
  • scanPerformance + analyzeStructuralPerf MCP tools — expose both scan modes via MCP and CLI (ckb perf coupling / ckb perf structural)
  • Cartographer FFI — expanded SearchContent options (before/after context, invert, word-regexp, files-with-matches, count-only, no-ignore), FindFiles with FindOptions (modified-since, size bounds, max-depth), ReplaceContent and ExtractContent bindings
  • Large-repo SCIP gate: ckb index skips SCIP above 50k source files and guides the user to the FTS + LSP + LIP tier; ckb doctor reports which tier is active and whether the LIP daemon is running
  • Three targeted large-repo perf wins (see benchmarks below)

Perf optimizations (with before/after numbers)

All measured on Apple M4 Pro, arm64, -count=3 -benchmem unless noted.

1. Lift seen map out of recordCommit

One make(map[string]bool) per commit → allocate once, range-delete to clear.

| Benchmark | Metric | Before | After | Δ |
|---|---|---|---|---|
| CoChangePipeline/500c_10f | allocs/op | 1526 | 29 | −98% |
| CoChangePipeline/1kc_20f | allocs/op | 3072 | 75 | −97.6% |
| CoChangePipeline/1kc_20f | B/op | 2,522,031 | 1,586,994 | −37% |
| CoChangePipeline/1kc_20f | ns/op | ~5.4 ms | ~4.8 ms | −11% |
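
A minimal sketch of the allocate-once, range-delete pattern described in item 1. The function and type names (`countCoChanges`, `pairKey`) are hypothetical and not taken from `internal/perf`; only the map-reuse idea comes from the description above.

```go
package main

import "fmt"

// pairKey identifies an unordered pair of files (a < b). Hypothetical type.
type pairKey struct{ a, b string }

// countCoChanges tallies how often two files appear in the same commit.
// The seen map is allocated once and cleared between commits instead of
// being re-made for every commit.
func countCoChanges(commits [][]string) map[pairKey]int {
	pairs := make(map[pairKey]int)
	seen := make(map[string]bool) // allocated once for the whole scan
	for _, files := range commits {
		// Range-delete empties the map but keeps its storage for reuse.
		for k := range seen {
			delete(seen, k)
		}
		for _, f := range files {
			if seen[f] {
				continue // duplicate path within one commit
			}
			for other := range seen {
				a, b := f, other
				if a > b {
					a, b = b, a
				}
				pairs[pairKey{a, b}]++
			}
			seen[f] = true
		}
	}
	return pairs
}

func main() {
	pairs := countCoChanges([][]string{
		{"a.go", "b.go"},
		{"a.go", "b.go", "a.go"},
		{"c.go"},
	})
	fmt.Println(pairs[pairKey{"a.go", "b.go"}])
}
```

The Go compiler recognises the range-delete loop as a map clear, so it does not cost one hash lookup per key.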

2. buildExplanation: fmt.Sprintf → strings.Builder + strconv

| Variant | ns/op before | ns/op after | allocs before | allocs after |
|---|---|---|---|---|
| non-entrypoint | 352 | 208 | 6 | 3 |
| entrypoint | 350 | 188 | 7 | 3 |
| CallSitePipeline/500sites | 160,000 | 75,000 | 3100 | 700 |
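
For illustration, the same kind of rewrite on a made-up message format. The wording of the explanation string and both function names are assumptions; the real buildExplanation output is not shown in this PR description.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// explainSprintf is the straightforward fmt.Sprintf form (hypothetical format).
func explainSprintf(callee string, line, depth int) string {
	return fmt.Sprintf("call to %s at line %d (loop depth %d)", callee, line, depth)
}

// explainBuilder produces the same string with strings.Builder and strconv,
// avoiding fmt's interface boxing and format-string parsing.
func explainBuilder(callee string, line, depth int) string {
	var b strings.Builder
	b.Grow(len(callee) + 40) // rough size hint so the buffer rarely regrows
	b.WriteString("call to ")
	b.WriteString(callee)
	b.WriteString(" at line ")
	b.WriteString(strconv.Itoa(line))
	b.WriteString(" (loop depth ")
	b.WriteString(strconv.Itoa(depth))
	b.WriteByte(')')
	return b.String()
}

func main() {
	fmt.Println(explainBuilder("db.Query", 42, 2))
	fmt.Println(explainSprintf("db.Query", 42, 2) == explainBuilder("db.Query", 42, 2))
}
```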

3. Stream git output via bufio.Scanner

Replaced cmd.Output() + bulk string split with StdoutPipe + bufio.Scanner — one 64 KB ring buffer, zero bulk copy.
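
A sketch of the streaming shape, assuming a plain `git log --name-only` invocation. The helper names and the exact git arguments are illustrative; the flags the scanner actually passes are not shown here.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"os/exec"
	"strings"
)

// nonEmptyLines scans r line by line and returns the non-empty lines.
func nonEmptyLines(r io.Reader) ([]string, error) {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 64*1024), 64*1024) // one fixed 64 KB line buffer
	var out []string
	for sc.Scan() {
		line := bytes.TrimRight(sc.Bytes(), "\r")
		if len(line) == 0 {
			continue
		}
		out = append(out, string(line)) // copy: sc.Bytes() is reused by Scan
	}
	return out, sc.Err()
}

// gitChangedPaths streams git output through a pipe instead of cmd.Output().
// The git arguments are an assumption chosen for illustration.
func gitChangedPaths(dir string) ([]string, error) {
	cmd := exec.Command("git", "log", "--name-only", "--pretty=format:")
	cmd.Dir = dir
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	paths, scanErr := nonEmptyLines(stdout)
	if scanErr != nil {
		io.Copy(io.Discard, stdout) // drain so git can exit before Wait
	}
	if err := cmd.Wait(); err != nil {
		return nil, err
	}
	return paths, scanErr
}

// demoInput stands in for the git pipe so the example runs without a repo.
func demoInput() io.Reader { return strings.NewReader("a.go\r\n\nb.go\n") }

func main() {
	lines, err := nonEmptyLines(demoInput())
	fmt.Println(lines, err)
}
```

All reads from the pipe finish before cmd.Wait is called, as the os/exec documentation requires.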

4. SQLite bulk PRAGMA tuning

synchronous=OFF, cache_size=-131072 (128 MB), wal_autocheckpoint=0 during PopulateFromFullIndex, with a single PRAGMA wal_checkpoint(TRUNCATE) on completion. Eliminates WAL checkpoint interruptions across the 50 batch transactions on large repos.

5. FTS batched INSERT

Replaced ~2M individual stmt.Exec calls in BulkInsert with 499-row multi-row INSERTs inside one transaction. Measured at 50k-doc scale (2M symbols):

| Benchmark | current | streaming |
|---|---|---|
| large_50k_docs | 252 s | 165 s |
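
The batching can be sketched as below. The table name `symbols_fts`, the two-column record, and the helper names are hypothetical; only the 499-row batch size comes from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// record is a hypothetical two-column FTS row.
type record struct{ ID, Name string }

// buildBatch returns one multi-row INSERT and its flattened arguments.
func buildBatch(rows []record) (string, []any) {
	var sb strings.Builder
	sb.WriteString("INSERT INTO symbols_fts (id, name) VALUES ")
	args := make([]any, 0, len(rows)*2)
	for i, r := range rows {
		if i > 0 {
			sb.WriteByte(',')
		}
		sb.WriteString("(?,?)")
		args = append(args, r.ID, r.Name)
	}
	return sb.String(), args
}

// forEachBatch splits rows into chunks of at most size (size must be > 0)
// and calls fn once per chunk; fn would run tx.Exec in a real caller.
func forEachBatch(rows []record, size int, fn func(query string, args []any) error) error {
	for start := 0; start < len(rows); start += size {
		end := start + size
		if end > len(rows) {
			end = len(rows)
		}
		q, a := buildBatch(rows[start:end])
		if err := fn(q, a); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	rows := make([]record, 1000)
	batches := 0
	_ = forEachBatch(rows, 499, func(q string, args []any) error {
		batches++
		return nil
	})
	fmt.Println(batches) // 499 + 499 + 2 rows
}
```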

6. CallerIndex background pre-warm

buildCallerIndex blocks the first getCallGraph / traceUsage call. Now starts in a background goroutine immediately after LoadIndex returns; callerIndexOnce prevents duplicate work if the call races the goroutine.

| Repo size | Cold-start latency absorbed |
|---|---|
| 1k docs | 4.6 ms |
| 10k docs | 83 ms |
| 50k docs | 6.7 s, 489 MB |
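
A minimal sketch of the pre-warm pattern with sync.Once. The `index` type, its fields, and the sample data are stand-ins and not the real SCIP adapter.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// index is a hypothetical stand-in for the loaded SCIP index.
type index struct {
	once    sync.Once
	builds  atomic.Int32 // counts how many times build ran
	callers map[string][]string
}

// build is the expensive step; here it just fills a tiny map.
func (ix *index) build() {
	ix.builds.Add(1)
	ix.callers = map[string][]string{"parse": {"main", "load"}}
}

// Prewarm starts the build in the background and returns immediately.
func (ix *index) Prewarm() { go ix.once.Do(ix.build) }

// FindCallers waits for the build if it is still running, or triggers it
// if Prewarm was never called. Once guarantees build runs exactly one time.
func (ix *index) FindCallers(sym string) []string {
	ix.once.Do(ix.build)
	return ix.callers[sym]
}

func main() {
	ix := &index{}
	ix.Prewarm()
	fmt.Println(ix.FindCallers("parse"), ix.builds.Load())
}
```

sync.Once also provides the happens-before edge that makes the map written in build safely visible to every FindCallers caller.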

7. BulkInsertFunc streaming API for FTS

PopulateFTSFromSCIP previously built a full []SymbolFTSRecord slice (~400 MB for 50k-file repo) before the transaction started. BulkInsertFunc takes a fn(flush) callback so the caller can stream in 10k-record chunks, never materialising the full slice.

| Symbols | BulkInsert (old) alloc | BulkInsertFunc (new) alloc |
|---|---|---|
| 100k | 97 MB | 87 MB |
| 500k | 493 MB | 439 MB |
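
One possible shape for a fn(flush) streaming API, shown against an in-memory store. The signature below is an assumption made for illustration; the actual BulkInsertFunc signature is not given in this description.

```go
package main

import "fmt"

// symbolRecord is a hypothetical FTS record.
type symbolRecord struct{ ID, Name string }

// store is an in-memory stand-in for the FTS table.
type store struct {
	rows    int
	flushes int
}

// BulkInsertFunc hands fn a flush function; fn calls flush once per chunk,
// so the full record slice never has to exist at once.
func (s *store) BulkInsertFunc(fn func(flush func([]symbolRecord) error) error) error {
	flush := func(chunk []symbolRecord) error {
		s.rows += len(chunk) // a real implementation would INSERT here
		s.flushes++
		return nil
	}
	return fn(flush)
}

// streamSymbols feeds n generated records in chunks, reusing one buffer.
func streamSymbols(s *store, n, chunkSize int) error {
	return s.BulkInsertFunc(func(flush func([]symbolRecord) error) error {
		buf := make([]symbolRecord, 0, chunkSize)
		for i := 0; i < n; i++ {
			buf = append(buf, symbolRecord{ID: fmt.Sprintf("sym%d", i), Name: "f"})
			if len(buf) == chunkSize {
				if err := flush(buf); err != nil {
					return err
				}
				buf = buf[:0] // keep capacity, drop contents
			}
		}
		if len(buf) > 0 {
			return flush(buf)
		}
		return nil
	})
}

func main() {
	s := &store{}
	err := streamSymbols(s, 25, 10)
	fmt.Println(s.rows, s.flushes, err)
}
```

Because the buffer is reused after each flush, a flush implementation must not retain the slice it receives.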

8. symbolsForFiles batch query in SemanticSearchWithLIP

Per-URI WHERE file_path = ? queries replaced with a single WHERE file_path IN (…) call via the new symbolsForFiles method. SemanticSearchWithLIP signature updated to take a batch callback.

| Files resolved | N queries (old) | Batch IN (new) | Speedup |
|---|---|---|---|
| 5 | 118 µs | 84 µs | 1.4× |
| 10 | 281 µs | 168 µs | 1.7× |
| 20 | 756 µs | 301 µs | 2.5× |
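
A sketch of how the single IN query and the map-shaped batch result might be assembled. The table and column names and both helper names are assumptions; only the WHERE file_path IN (…) form is from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// symbolsForFilesQuery builds one IN (...) query covering every path.
// Table and column names are hypothetical.
func symbolsForFilesQuery(paths []string) (string, []any) {
	if len(paths) == 0 {
		return "", nil
	}
	placeholders := strings.Repeat("?,", len(paths))
	placeholders = placeholders[:len(placeholders)-1] // drop trailing comma
	args := make([]any, len(paths))
	for i, p := range paths {
		args[i] = p
	}
	return "SELECT file_path, name FROM symbols WHERE file_path IN (" + placeholders + ")", args
}

// groupByFile turns flat (file_path, name) rows into the map shape that a
// batch callback would return.
func groupByFile(rows [][2]string) map[string][]string {
	out := make(map[string][]string)
	for _, r := range rows {
		out[r[0]] = append(out[r[0]], r[1])
	}
	return out
}

func main() {
	q, args := symbolsForFilesQuery([]string{"a.go", "b.go", "c.go"})
	fmt.Println(q, len(args))
	fmt.Println(groupByFile([][2]string{{"a.go", "Scan"}, {"a.go", "Parse"}, {"b.go", "Load"}}))
}
```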

Test results

ok  github.com/SimplyLiz/CodeMCP/internal/perf          1.9s   (37 tests)
ok  github.com/SimplyLiz/CodeMCP/internal/storage        2.6s
ok  github.com/SimplyLiz/CodeMCP/internal/query          6.9s
ok  github.com/SimplyLiz/CodeMCP/internal/backends/scip  2.0s
ok  github.com/SimplyLiz/CodeMCP/cmd/ckb                 4.2s
ok  github.com/SimplyLiz/CodeMCP/internal/mcp            9.6s

Race detector clean on ./internal/backends/scip/... (covers new background goroutine).

Test plan

  • go test ./... — all packages green
  • go test -race ./internal/backends/scip/... — no data races
  • go test -bench=BenchmarkBulkInsertVsFunc -benchmem ./internal/storage/...
  • go test -bench=BenchmarkSymbolsForFileVsBatch -benchmem ./internal/storage/...
  • go test -bench=BenchmarkBuildCallerIndex -benchmem ./internal/backends/scip/...

🤖 Generated with Claude Code

SimplyLiz and others added 16 commits April 9, 2026 11:13
…ndings)

- internal/cartographer/bridge.go — full CGo bindings (11 FFI functions);
  RankedSkeleton() and UnreferencedSymbols() added for v1.6.0 support
- internal/cartographer/bridge_stub.go — stubs for all 13 functions incl.
  RankedSkeleton and UnreferencedSymbols; builds cleanly without Rust toolchain
- internal/cartographer/types.go — complete Go type set incl. new
  RankedSkeletonResult/File and UnreferencedSymbolsResult/File
- internal/query/status.go — Cartographer added to getBackendStatuses();
  shows availability, version, and capabilities in `ckb status`
- internal/query/review_layers.go — new checkLayerViolations() check;
  runs Cartographer layer analysis on PR-changed files; skips gracefully
  when not compiled in; tier-2 findings (architecture/warning)
- internal/query/review.go — wire layers check into the parallel check
  loop; add "layers" to tier-2 in findingTier
- Makefile — build/build-cartographer/build-fast/test/lint/clean targets;
  documents the -tags cartographer build flag

All binaries build clean with and without -tags cartographer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wire five Cartographer FFI calls into the review pipeline so CKB uses
the Rust library in-process, not via MCP:

- review_coupling.go — HiddenCoupling() augments coupling check with
  co-change pairs that have no import edge; deduped against SCIP gaps;
  rule: ckb/coupling/hidden
- review_deadcode.go — UnreferencedSymbols() adds phase 3 to dead-code
  check; catches public exports with no callers project-wide; rule:
  ckb/dead-code/unreferenced-export
- review_arch_health.go (new) — checkArchitecturalHealth() reports
  cycles (≥3 → error), god modules, and layer violations from
  cartographer.Health(); tier-2; rules: ckb/arch-health/*
- review_blastradius.go — GitChurn() loaded at check start; files with
  ≥15 commits escalate blast-radius findings from info → warning
- review.go — arch-health goroutine wired into parallel check loop;
  arch-health added to tier-2 in findingTier()

All calls guarded by cartographer.Available(); both build paths clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses the remaining memory and cold-start bottlenecks for large repos.

mmap (mmap_unix.go / mmap_other.go):
- On Unix, memory-map the .scip file instead of os.ReadFile. The OS
  manages paging; raw bytes never hit the Go heap. Falls back to
  ReadFile on non-Unix platforms.

Streaming protobuf (loader.go):
- Replace proto.Unmarshal into a full scippb.Index (which materialises
  all documents simultaneously) with a protowire stream parse. A producer
  goroutine emits one *scippb.Document at a time to a buffered channel;
  nWorkers consumers convert+index each doc and release it.
- Peak memory drops from ~3× .scip file size (raw bytes + scippb.Index
  + CKB structs) to ~1× (CKB structs only; raw bytes are mmap pages
  managed by the OS).
- Fixed: SCIP Index.Documents is field 2, not 3 (ExternalSymbols is 3).
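
To make the field-2 detail concrete, here is a standard-library-only sketch of walking top-level protobuf fields and yielding each length-delimited payload for one field number. It substitutes encoding/binary for the protowire package named above, and `eachDocument` is a hypothetical name; a real loader would unmarshal each payload into a document message.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errTruncated = errors.New("truncated protobuf message")

// eachDocument walks the top-level fields of an encoded message and calls fn
// with the raw bytes of every length-delimited field numbered docField.
func eachDocument(buf []byte, docField uint64, fn func(doc []byte)) error {
	for len(buf) > 0 {
		tag, n := binary.Uvarint(buf)
		if n <= 0 {
			return errTruncated
		}
		buf = buf[n:]
		field, wireType := tag>>3, tag&7
		switch wireType {
		case 0: // varint
			_, n := binary.Uvarint(buf)
			if n <= 0 {
				return errTruncated
			}
			buf = buf[n:]
		case 1: // fixed64
			if len(buf) < 8 {
				return errTruncated
			}
			buf = buf[8:]
		case 2: // length-delimited
			size, n := binary.Uvarint(buf)
			if n <= 0 || uint64(len(buf)-n) < size {
				return errTruncated
			}
			if field == docField {
				fn(buf[n : n+int(size)]) // sub-slice, no copy
			}
			buf = buf[n+int(size):]
		case 5: // fixed32
			if len(buf) < 4 {
				return errTruncated
			}
			buf = buf[4:]
		default:
			return fmt.Errorf("unsupported wire type %d", wireType)
		}
	}
	return nil
}

func main() {
	// field 1 = "m", field 2 = "ab", field 2 = "c", field 3 = "x"
	msg := []byte{0x0a, 1, 'm', 0x12, 2, 'a', 'b', 0x12, 1, 'c', 0x1a, 1, 'x'}
	err := eachDocument(msg, 2, func(doc []byte) { fmt.Printf("%s\n", doc) })
	fmt.Println(err)
}
```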

DefinitionIndex (loader.go, symbols.go):
- Add DefinitionIndex map[string]*OccurrenceRef — first definition
  occurrence per symbol, built for free during the parallel doc phase.
- findSymbolLocationFast now hits DefinitionIndex in O(1) instead of
  scanning all RefIndex entries for the symbol (was O(k) per symbol,
  expensive for high-cardinality symbols during ConvertedSymbols build).

NameIndex + SearchSymbols (loader.go, symbols.go):
- Add NameIndex []NameEntry (sorted by name) built after ConvertedSymbols.
- SearchSymbols iterates the compact sorted slice instead of the
  ConvertedSymbols map. Cache-line–friendly access pattern vs scattered
  map bucket pointers; also enables early-exit prefix search.
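
A sketch of the sorted-slice prefix search with early exit. The entry type and function names are illustrative; the (Name, ID) tie-break is an assumption borrowed from the determinism fix described later in this PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// nameEntry is a hypothetical compact index entry.
type nameEntry struct{ Name, ID string }

// buildNameIndex sorts by (Name, ID) so equal names get a stable total order.
func buildNameIndex(entries []nameEntry) []nameEntry {
	idx := append([]nameEntry(nil), entries...)
	sort.Slice(idx, func(i, j int) bool {
		if idx[i].Name != idx[j].Name {
			return idx[i].Name < idx[j].Name
		}
		return idx[i].ID < idx[j].ID
	})
	return idx
}

// prefixSearch binary-searches for the first candidate, then stops at the
// first entry that no longer has the prefix.
func prefixSearch(idx []nameEntry, prefix string) []nameEntry {
	start := sort.Search(len(idx), func(i int) bool { return idx[i].Name >= prefix })
	end := start
	for end < len(idx) && strings.HasPrefix(idx[end].Name, prefix) {
		end++
	}
	return idx[start:end]
}

func main() {
	idx := buildNameIndex([]nameEntry{
		{"Scan", "s2"}, {"Parse", "p1"}, {"Scan", "s1"}, {"ScanFiles", "s3"}, {"Search", "s4"},
	})
	fmt.Println(prefixSearch(idx, "Scan"))
}
```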

Gob cache (cache.go, adapter.go):
- After the first full build, saveDerivedCache writes ConvertedSymbols,
  ContainerIndex, and NameIndex to .ckb/scip_derived.gob (async).
- On subsequent startups, loadDerivedCache validates mtime+size against
  the .scip file and, if fresh, restores all three via applyCachedDerived,
  skipping the parallel symbol-conversion phase entirely.
- Cache file is written atomically (tmp + rename).
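
The validate-then-restore and tmp + rename steps above could look roughly like this. The struct layout, file names, and helper names are hypothetical and much smaller than the real derived cache.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"os"
	"path/filepath"
)

// derivedCache is a hypothetical cache payload keyed to the source file.
type derivedCache struct {
	SrcSize    int64
	SrcModNano int64
	Names      []string
}

// saveCache writes to a temp file in the same directory, then renames it,
// so readers never observe a half-written cache.
func saveCache(cachePath, srcPath string, names []string) error {
	st, err := os.Stat(srcPath)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(cachePath), "derived-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless once the rename has succeeded
	c := derivedCache{SrcSize: st.Size(), SrcModNano: st.ModTime().UnixNano(), Names: names}
	if err := gob.NewEncoder(tmp).Encode(c); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), cachePath)
}

// loadCache returns the cached names only if size and mtime still match.
func loadCache(cachePath, srcPath string) ([]string, bool) {
	st, err := os.Stat(srcPath)
	if err != nil {
		return nil, false
	}
	f, err := os.Open(cachePath)
	if err != nil {
		return nil, false
	}
	defer f.Close()
	var c derivedCache
	if err := gob.NewDecoder(f).Decode(&c); err != nil {
		return nil, false
	}
	if c.SrcSize != st.Size() || c.SrcModNano != st.ModTime().UnixNano() {
		return nil, false
	}
	return c.Names, true
}

// roundTrip saves a cache, loads it, then changes the source and loads again.
func roundTrip() (fresh, hitAfterChange bool, err error) {
	dir, err := os.MkdirTemp("", "cache-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	src, cache := filepath.Join(dir, "index.scip"), filepath.Join(dir, "derived.gob")
	if err = os.WriteFile(src, []byte("v1"), 0o644); err != nil {
		return
	}
	if err = saveCache(cache, src, []string{"Scan"}); err != nil {
		return
	}
	_, fresh = loadCache(cache, src)
	if err = os.WriteFile(src, []byte("v2-longer"), 0o644); err != nil {
		return
	}
	_, hitAfterChange = loadCache(cache, src)
	return fresh, hitAfterChange, nil
}

func main() {
	fmt.Println(roundTrip())
}
```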

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wire Cartographer into existing query endpoints — all guarded by
cartographer.Available(), zero impact on non-Cartographer builds:

- getArchitecture — MapProject adds arch health score, cycles,
  god modules, and bridge nodes to the response
- analyzeImpact — SimulateChange adds predicted affected modules,
  cycle risk, layer violations, and health delta (ArchImpact field)
- getModuleOverview — GetModuleContext adds skeleton (signatures +
  imports + deps) for the requested module
- summarizeDiff — Semidiff adds function-level added/removed signatures
  per file for commit-range selectors (FunctionChanges field)
- getHotspots — GitCochange adds co-change partners (top 3) to each
  hotspot file (CochangePartners field)
- exportForLLM — SkeletonMap / RankedSkeleton injected into response;
  tokenBudget param triggers personalized PageRank skeleton
- review_layers — auto-detects .cartographer/layers.toml instead of
  always passing empty string
- status — capabilities list expanded to enumerate all 11 active
  Cartographer integrations
- skeleton.go (new) — GetSkeleton/GetRankedSkeleton engine helpers
  used by exportForLLM and future callers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ce tests

scanPerformance (internal/perf + MCP tool):
- Whole-repo hidden coupling scan in a single git log pass; filters
  out pairs with a static import edge via path-fragment heuristic
- Exposed as MCP tool scanPerformance with minCorrelation, minCoChanges,
  windowDays, limit, and scope params
- cmd/ckb/perf — CLI command wrapping the same analyzer with table and
  JSON output formats

internal/cicheck (CI workflow compliance tests):
- TestWorkflowActionsPinned — all uses: must be SHA-pinned
- TestWorkflowActionsVersionComments — SHA pins must have a version comment
- TestWorkflowJobsHaveTimeout — every job must declare timeout-minutes
- TestWorkflowNoDirectInputInterpolation — no ${{ inputs.* }} in run: blocks
- TestWorkflowNoLatestDockerTag — no docker://...:latest
- TestWorkflowConsistentActionVersions — same action must use same SHA everywhere

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- scipadapter_test.go: populate DocumentsByPath alongside Documents in
  test fixture; GetDocument now uses the map, not the slice
- presets_test.go, token_budget_test.go: update expected tool counts to
  99 (v8.5 adds analyzeStructuralPerf)
- perf/types.go: add StructuralPerfOptions and LoopCallSite types for
  the upcoming structural performance analysis feature
- typescript fixtures: sync expected search output

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ixes

analyzeStructuralPerf (v8.5):
- New MCP tool: detects loop call sites in high-churn files using
  tree-sitter. Complements scanPerformance (cross-file hidden coupling)
  with intra-file O(n)/O(n²) structural signals.
- tool_impls_perf.go: implementation wired to internal/perf
- tools.go: tool definition with windowDays, minChurnCount, limit, scope,
  entrypointFiles params
- presets.go: added to full preset (total 99 tools)

navigation.go (explore):
- Use Cartographer.MapProject when available for directory overview;
  gains ignore-aware file list and per-language file counts (Languages
  field added to ExploreResult)
- Falls back to OS walk when Cartographer is unavailable

compound.go: minor query engine updates

scanner_bench_test.go, tool_impls_batch2_test.go: test fixes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lit, GetOwnership

PrepareChange (compound.go):
- getPrepareCoChanges now uses cartographer.GitCochange() — single git
  pass (bot-filtered) instead of O(n) per-file subprocess spawns
- Cross-references with HiddenCoupling() to mark each pair IsHidden when
  there is no import edge; callers can surface implicit risk to the LLM
- Falls back to coupling.Analyzer when Cartographer is not compiled in
- Added IsHidden bool to PrepareCoChange struct

PlanRefactor (compound_refactor.go):
- Collects hidden-coupling files from PrepareChange CoChangeFiles where
  IsHidden=true and surfaces them as HiddenCouplingFiles in CouplingAnalysis
- Fallback: calls HiddenCoupling() directly when PrepareChange ran without
  Cartographer but it became available by assembly time
- Added HiddenCouplingFiles []string to PlanRefactorCoupling struct

suggestPRSplit (review_split.go):
- addCartographerEdges: builds adjacency from static import graph
  (MapProject) + temporal coupling (GitCochange ≥ 0.5) in two single-pass
  calls; no per-file subprocess limit
- Replaced 200-file skipCoupling heuristic with Cartographer path;
  fallback to addCouplingEdges for non-Cartographer builds (200-file cap kept)

GetOwnership (ownership.go):
- CoChangePartners field added to response — top 5 files that co-change
  with the queried path (noise-filtered); implicit co-owners invisible
  to CODEOWNERS but visible in git history

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…2.1)

SCIP determinism:
- loader.go: sort parallel-load results by document path before merging
  so RefIndex/DefinitionIndex construction is goroutine-schedule-independent
- loader.go: fix NameIndex sort to total order (Name, ID) — map iteration
  produced non-deterministic output when two symbols share a name
- adapter.go: SetCacheRoot() lets tests redirect the derived-index cache
  so concurrent tests don't race on the shared fixture scip_derived.gob

FTS determinism:
- fts.go: sort symbol IDs before inserting into FTS5 — map iteration
  produced random BM25 scores and flaky golden test rankings
- engine.go: extract StartBgTasks() from NewEngine(); production entry
  points call it explicitly; tests skip it and call PopulateFTSFromSCIP
  synchronously to get deterministic state
- engine_helper.go + server.go: call StartBgTasks() after engine init

Test isolation:
- golden_test.go: call DisableBgFTS() + SetCacheRoot(tmpDir) to prevent
  background goroutines racing with synchronous FTS population
- Update golden fixtures to reflect deterministic symbol ordering

Bump to v8.2.1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
compliance/scanner.go:
- Reuse identifier buffer and seen-set across lines to eliminate
  per-line map/slice allocations in scanFile and CheckPIIInLogs
- Replace unicode import with sync (buffer reset pattern)
- Add compliance testdata fixtures

internal/perf/structural.go (cgo build):
- AnalyzeStructural: detects calls-inside-loops using tree-sitter;
  identifies O(n)/O(n²) structural anti-patterns in high-churn files
  via three-stage pipeline: git churn → tree-sitter parse → annotation

internal/perf/structural_stub.go (!cgo build):
- Stub for non-cgo builds; returns ErrUnavailable

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Restructures the perf CLI from a single flat command into a parent with
two explicit subcommands so each analysis mode has its own flags and help.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… tests

- Add SearchContent and FindFiles FFI calls with stub implementations
- Add structural perf tests split across CGO and non-CGO build tags
- Add perf_bench_test.go: 22 benchmarks covering recordCommit O(n²),
  co-change pipeline simulation, importCouldReferTo, shouldIgnore
- Add structural_bench_test.go (CGO): 11 benchmarks for call-site
  annotation pipeline with documented baselines
- Expand FindFiles to accept FindOptions (filter by mtime, size, depth)
- Expand SearchContentOptions with full ripgrep-parity fields
- Add FileCount, FilesWithMatches/WithoutMatch fields to SearchResult
- ReplaceOptions/ReplaceResult/FileChange/DiffLine types for sed-like replace
- ExtractOptions/ExtractResult/ExtractMatch/CountEntry types for awk-like extract
- ReplaceContent() and ExtractContent() in real bridge + stubs
- Matches cartographer v1.8.0 FFI surface
…epoRoot field

- recordCommit: allocate seen map once in buildCoChangePairs, clear with
  range-delete instead of make() per commit → −97% allocs at 1k commits
- buildExplanation: strings.Builder + strconv replaces fmt.Sprintf → −40%
  latency, allocs halved (6→3 per site)
- Remove ScanOptions.RepoRoot (dead field — Analyzer.repoRoot used throughout)
- Update bench baselines and document results in docs/performance_log.md
Replace cmd.Output()+strings.Split with StdoutPipe+bufio.Scanner so the
full git log is never loaded into memory as a bulk string. On a repo with
10k commits the old path double-copied ~500 KB ([]byte→string→[]string);
now each line is processed in-place from a 64 KB ring buffer.

Also switch from TrimSpace to bytes.TrimRight("\r") — precise, no leading
space scan — and fix BenchmarkCoChangePipelineSimulated's pairs map hint
from commits×files (20k) to files*(files-1)/2 (190), the actual unique
pair ceiling.

github-actions bot commented Apr 10, 2026

🟢 Change Impact Analysis

| Metric | Value |
|---|---|
| Risk Level | LOW 🟢 |
| Files Changed | 152 |
| Symbols Changed | 132 |
| Directly Affected | 0 |
| Transitively Affected | 0 |

Blast Radius: 0 modules, 0 files, 0 unique callers

📝 Changed Symbols (132)
| Symbol | File | Type | Confidence |
|---|---|---|---|
| .gitignore | .gitignore | modified | 30% |
| CARTAGORAPHER_INTEGRATION_SUMMARY.txt | CARTAGORAPHER_INTEGRATION_SUMMARY.txt | added | 30% |
| CARTOGRAPHER_INTEGRATION.md | CARTOGRAPHER_INTEGRATION.md | added | 30% |
| CARTOGRAPHER_INTEGRATION_SUMMARY.md | CARTOGRAPHER_INTEGRATION_SUMMARY.md | added | 30% |
| CARTOGRAPHER_PROJECT_STATUS.md | CARTOGRAPHER_PROJECT_STATUS.md | added | 30% |
| CARTOGRAPHER_RELEASE_PLAN.md | CARTOGRAPHER_RELEASE_PLAN.md | added | 30% |
| CARTOGRAPHER_STRATEGY.md | CARTOGRAPHER_STRATEGY.md | added | 30% |
| CHANGELOG.md | CHANGELOG.md | modified | 30% |
| Makefile | Makefile | added | 30% |
| bench/baselines/v8.2.1.txt | bench/baselines/v8.2.1.txt | added | 30% |
| bench/baselines/v8.4.0.txt | bench/baselines/v8.4.0.txt | added | 30% |
| bench/baselines/v8.5.0.txt | bench/baselines/v8.5.0.txt | added | 30% |
| cmd/ckb-bench/main.go | cmd/ckb-bench/main.go | added | 30% |
| cmd/ckb/engine_helper.go | cmd/ckb/engine_helper.go | modified | 30% |
| cmd/ckb/impact.go | cmd/ckb/impact.go | modified | 30% |

+117 more

Recommendations

  • ℹ️ coverage: 132 symbols have low mapping confidence. Index may be stale.
    • Action: Run 'ckb index' to refresh the SCIP index

Generated by CKB


github-actions bot commented Apr 10, 2026

CKB Analysis

Risk Files +35197 -755 Modules

🎯 132 changed → 0 affected · 🔥 30 hotspots · 📊 8 complex · 💣 20 blast · 📚 175 stale

Risk factors: Large PR with 158 files • High churn: 35952 lines changed • Touches 30 hotspot(s)

👥 Suggested: @lisa.welsch1985@gmail.com (6%), @talantyyr@gmail.com (1%), @lisa@tastehub.io (1%)

| Metric | Value |
|---|---|
| Impact Analysis | 132 symbols → 0 affected 🟢 |
| Doc Coverage | 7.428571428571429% ⚠️ |
| Complexity | 8 violations ⚠️ |
| Coupling | 0 gaps |
| Blast Radius | 0 modules, 0 files |
| Index | indexed (0s) 💾 |

🎯 Change Impact Analysis · 🟢 LOW · 132 changed → 0 affected

| Metric | Value |
|---|---|
| Symbols Changed | 132 |
| Directly Affected | 0 |
| Transitively Affected | 0 |
| Modules in Blast Radius | 0 |
| Files in Blast Radius | 0 |

Symbols changed in this PR:

Recommendations:

  • ℹ️ 132 symbols have low mapping confidence. Index may be stale.
    • Action: Run 'ckb index' to refresh the SCIP index
💣 Blast radius · 0 symbols · 20 tests · 0 consumers

Tests that may break:

  • cmd/ckb-bench/version_test.go
  • internal/backends/scip/scale_bench_test.go
  • internal/cicheck/cicheck_test.go
  • internal/compliance/scanner_bench_test.go
  • internal/compliance/scanner_test.go
  • … and 15 more

🔥 Hotspots · 30 volatile files

| File | Churn Score |
|---|---|
| CHANGELOG.md | 11.33 |
| cmd/ckb/index.go | 7.33 |
| cmd/ckb/perf.go | 9.24 |
| go.mod | 8.33 |
| internal/backends/scip/callgraph.go | 7.60 |
| internal/backends/scip/loader.go | 12.87 |
| internal/cartographer/bridge.go | 13.04 |
| internal/cartographer/bridge_stub.go | 7.76 |

📦 Modules · 6 at risk

| Module | Files |
|---|---|
| 🔴 third_party/cartographer | 52 |
| 🔴 internal/query | 22 |
| 🔴 internal/mcp | 14 |
| 🟡 internal/backends | 9 |
| 🟡 internal/perf | 9 |
| 🟡 internal/incremental | 6 |

📊 Complexity · 8 violations

| File | Cyclomatic | Cognitive |
|---|---|---|
| cmd/ckb-bench/main.go | ⚠️ 31 | ⚠️ 90 |
| cmd/ckb/impact.go | ⚠️ 24 | ⚠️ 48 |
| cmd/ckb/index.go | ⚠️ 54 | ⚠️ 97 |
| internal/backends/scip/callgraph.go | ⚠️ 35 | ⚠️ 78 |
| internal/backends/scip/loader.go | ⚠️ 71 | ⚠️ 240 |
| internal/backends/scip/streaming.go | 11 | ⚠️ 25 |
| internal/backends/scip/symbols.go | ⚠️ 22 | ⚠️ 54 |
| internal/cicheck/cicheck_test.go | ⚠️ 24 | ⚠️ 87 |

💡 Quick wins · 10 suggestions

📚 Stale docs · 175 broken references

Generated by CKB · Run details


codecov bot commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 39.31679% with 1883 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/lip/client.go | 9.7% | 164 Missing and 2 partials ⚠️ |
| internal/incremental/updater.go | 42.1% | 135 Missing and 19 partials ⚠️ |
| cmd/ckb/perf.go | 15.1% | 95 Missing ⚠️ |
| internal/incremental/extractor.go | 0.0% | 95 Missing ⚠️ |
| internal/mcp/tool_impls_perf.go | 0.0% | 85 Missing ⚠️ |
| internal/query/lip_ranker.go | 5.1% | 73 Missing ⚠️ |
| cmd/ckb/impact.go | 4.0% | 71 Missing ⚠️ |
| internal/mcp/tool_impls_v86.go | 0.0% | 71 Missing ⚠️ |
| internal/storage/lip_annotations.go | 0.0% | 69 Missing ⚠️ |
| internal/mcp/tool_annotations.go | 0.0% | 63 Missing ⚠️ |

... and 41 more

Additional details and impacted files
@@            Coverage Diff            @@
##           develop    #205     +/-   ##
=========================================
- Coverage     43.0%   42.8%   -0.2%     
=========================================
  Files          507     525     +18     
  Lines        78022   80741   +2719     
=========================================
+ Hits         33614   34632   +1018     
- Misses       42045   43675   +1630     
- Partials      2363    2434     +71     
Flag Coverage Δ
unit 42.8% <39.3%> (-0.2%) ⬇️


@github-actions

CKB review failed to generate output.

… below MinCoChanges early

- ScanFiles now allocates seen+identBuf once per scan instead of once per
  file; eliminates N make() calls for N-file scans
- ScanOptions.MaxCommitFiles: skip commits above the file-count threshold
  (mass renames / formatting sweeps); 0 = unlimited
- buildCoChangePairs prunes pairCounts entries with count < MinCoChanges
  after parsing; lossless — Scan() would filter them anyway, but doing it
  early cuts the O(N²/2) correlation iteration for large repos
- structural.go call-site updated to pass new params (0, 1 = no pruning)
Adds synthetic scale benchmarks for the 50k-file indexing bottleneck
(1h SCIP + 10h+ ckb index timeout on customer repo):

- backends/scip/scale_bench_test.go: LoadSCIPIndex at 1k/10k/50k docs
  baseline: ~8s/op, 6.9 GB alloc at 50k docs
- incremental/scale_bench_test.go: ApplyDelta, PopulateFullIndex,
  UpdateFileDepsHotPath, GetDependenciesPerFile at scale
  baseline: ~50s/op at 50k files × 40 syms × 200 refs
- bench/baselines/v8.2.1.txt: benchstat-compatible before-fix reference
…r, scip alloc fix

- cartographer: expose BM25Search and QueryContext FFI bindings with full
  options/result types; stub no-ops added to bridge_stub.go
- incremental: split UpdateFileDeps into one-off and bulk paths
  (updateFileDepsWithStmt) to share prepared statements across batch inserts
- scip/loader: pre-allocate backing slice for OccurrenceRefs — cuts allocs
  from O(total_occs) to O(docs) at load time
- .gitignore: exclude /ckb-bench binary, registry token files, marketing zips
- docs: add cartographer integration docs, cognitive vault spec v1.1,
  roadmap v8.1, ckb-bench command skeleton, cartographer bench test
PopulateFromFullIndex rewrite for the 50k-file / 10h timeout case:

- Phase 2 extractFileDelta now runs in parallel (GOMAXPROCS workers) —
  CPU-bound SCIP document parsing was fully sequential before
- Single giant transaction split into 1000-file batches, keeping WAL
  bounded and allowing incremental checkpointing
- PRAGMA synchronous=OFF for the bulk load duration (safe: failed full
  index is always re-run from scratch)
- bulkInsertFileSymbols: batched multi-row VALUES (499 rows/stmt) instead
  of one Exec per symbol
- applyStmts struct: file_symbols, callgraph, and file_deps statements
  all prepared once per transaction — eliminates 3× 50k Prepare/Close
  round-trips that were the dominant cost on large repos
- insertCallEdgesWithStmt: callgraph inserts use the shared stmt too
…act format

- scip: build CallerIndex at load time (phase 4); FindCallers is now O(1)
  via map lookup instead of O(docs×syms×occs) scan; buildCallerIndex uses
  sorted interval scan with early-break and per-doc edge dedup
- envelope: add Backend/Accuracy fields to Meta; AccuracyForBackend helper
  maps scip→high, lsp→medium, tree-sitter/fallback→low
- query/engine: ActiveBackendName() returns active backend tier
- mcp: WithBackend on ToolResponse; analyzeImpact and prepareChange emit
  backend+accuracy in envelope; prepareChange supports format=compact for
  token-budget-constrained callers
- mcp/tool_impls: best-effort LIP nyx-agent-lock check on affected files
- cmd: `ckb impact prepare` subcommand with --format=compact
- bench: v8.4.0 baseline (scip alloc -15%, ApplyDelta/large -20%)
…lation

- Add queryContext tool: Cartographer PKG retrieval pipeline (BM25 search →
  personalized PageRank skeleton → context health) in a single MCP call.
  Returns ready-to-inject context bundle with token count and A–F grade.
- Add contextHealth tool: scores a context bundle on 6 research-backed metrics
  (signal density, compression density, position health, entity density,
  utilisation headroom, dedup ratio) with composite 0–100 score and
  actionable recommendations.
- prepareChange: add parallel Cartographer SimulateChange goroutine; result
  surfaces as archImpact on PrepareChangeResponse. Cycle risk and layer
  violations feed into calculatePrepareRisk as explicit risk factors.
- Bump full preset tool count to 101 (was 99).
git-subtree-dir: vendor/cartographer
git-subtree-split: 7e8fd8e8d9d29a0453d29dff436e2a65b61bbda9
… v8.5.0

- Move Cartographer from fragile sibling-repo CGo path to
  third_party/cartographer/ via git subtree (vendor/ conflicted with Go
  toolchain vendor consistency checks)
- Fix 6 CGo path directives in bridge.go: ../../../../Cartographer →
  ../../third_party/cartographer
- Wire three previously unexposed C exports as Go FFI + stubs:
  ShotgunSurgery, Evolution, BlastRadius
- Expose as MCP tools: detectShotgunSurgery, getArchitecturalEvolution,
  getBlastRadius (registered in tool_impls_v86.go)
- buildCallerIndex: pool ivs slice and replace per-doc docSeen map with
  generation counter — saves ~6k allocs on small SCIP load
- Bump version to 8.5.0
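
The generation-counter idea mentioned for buildCallerIndex can be sketched as follows. This is an illustration of the general technique under the assumption that symbols have dense integer IDs, not the actual implementation; `dedupPerDoc` is a hypothetical name.

```go
package main

import "fmt"

// dedupPerDoc returns, for each document, its distinct symbol IDs in
// first-seen order. Instead of a fresh seen map per document, one slice
// records the generation in which each symbol was last seen.
// Symbol IDs must lie in [0, numSymbols).
func dedupPerDoc(docs [][]int, numSymbols int) [][]int {
	lastSeen := make([]uint32, numSymbols) // zero means never seen
	out := make([][]int, len(docs))
	var gen uint32
	for d, syms := range docs {
		gen++ // bumping the generation "clears" the seen state in O(1)
		for _, s := range syms {
			if lastSeen[s] == gen {
				continue // already emitted for this document
			}
			lastSeen[s] = gen
			out[d] = append(out[d], s)
		}
	}
	return out
}

func main() {
	fmt.Println(dedupPerDoc([][]int{{0, 1, 0, 2}, {1, 1, 0}}, 3))
}
```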
@github-actions

CKB review failed to generate output.

- CallerIndex is now built on the first FindCallers call (sync.Once)
  instead of at LoadIndex time — removes ~22k persistent heap objects
  from small SCIP loads, restoring alloc count to v8.4.0 baseline
- proto.UnmarshalOptions{DiscardUnknown: true} on both document stream
  unmarshal calls; skips reflection-based unknown-field accumulator
- buildCallerIndex: reuse ivs slice + generation-counter deduplication
  (already landed; documented in CHANGELOG)
- lip.GetEmbedding: request TurboQuant-quantized embeddings from LIP
  daemon; same silent-degradation pattern as GetAnnotation
- CHANGELOG: full v8.5.0 section covering all perf improvements
  including v8.4.0 incremental wins not previously documented
@github-actions

CKB review failed to generate output.

SimplyLiz and others added 8 commits April 12, 2026 00:38
PopulateFromFullIndexStreaming: two-pass proto-native path that never
materialises the full SCIPIndex in RAM. Uses extractFileDeltaFromProto
to skip all intermediate scip.Document allocations. Adaptive threshold
in PopulateAfterFullIndex selects streaming for indexes > 200 MB,
old single-pass path for smaller ones. Large-repo cold-run: 485s → 83s.

LIP client: embedding_batch (single round-trip for RerankWithLIP),
nearest / nearest_by_text (HNSW-backed semantic file search),
symbol_embedding, index_status, file_status. Generic lipRPC transport
consolidates all simple request→response calls.

Fast-tier search: RerankWithLIP now uses GetEmbeddingsBatch instead of
N serial GetEmbedding calls. SemanticSearchWithLIP supplements sparse
FTS results (<3) via nearest_by_text → symbolsForFile resolution.
lipRanked flag prevents a redundant second embedding_batch pass.

LIP symbol annotations: annotationSet/Get/List MCP tools with local
SQLite backing. Watcher gains Seq + DeltaAck for LIP delta protocol.
Add scipLargeRepoThreshold (50k source files). When exceeded, ckb index
prints what's available without SCIP (FTS + LSP + LIP semantic search),
what requires SCIP (call graph, analyzeImpact), and the exact indexer
command to generate SCIP manually. --scip flag overrides the gate;
--force also proceeds with a duration warning.

Also update MCP full-preset tool count bounds to 107 (+3 Cartographer,
+3 LIP annotation tools added in previous commit).
PopulateFromFullIndex and PopulateFromFullIndexStreaming now set
wal_autocheckpoint=0 during the batch loop (eliminates WAL checkpoint
I/O interruptions between the 50 batch transactions on a 50k-file
repo) and double the page cache to 128 MB (vs startup's 64 MB) to
keep more B-tree nodes warm during unique-key checks. A single
PRAGMA wal_checkpoint(TRUNCATE) runs on defer after all batches.

FTS BulkInsert replaces 2M individual stmt.ExecContext calls with
batched 499-row multi-row INSERTs (~4k INSERT statements for a 50k-
file repo). Triggers are already dropped before the bulk; FTS5 rebuild
still runs once at the end. No change to the trigger/rebuild logic.

Also updates SARIF golden to v8.5.0.
ckb doctor now shows a 'lip' check — pass with indexed file count
when daemon is running, warn when it's not (semantic search disabled).

checkScip detects large repos (>50k source files) and shows a 'pass'
with a clear explanation: active tier is FTS+LSP+LIP, call graph
requires --scip to opt in. Replaces the misleading "not found" warn.

PopulateFTSFromSCIP: drop sort.Strings(symIDs). FTS5's 'rebuild'
produces deterministic BM25 output regardless of insert order — the
sort was allocating a 2M-element string slice on every full populate
for no benefit.
For a 50k-doc repo buildCallerIndex takes ~6.7s and allocates 489 MB.
Previously that work blocked the first getCallGraph / traceUsage call.
Now a background goroutine starts it immediately after s.index is set;
callerIndexOnce guarantees no duplicate work if FindCallers races it.

Benchmark (BenchmarkBuildCallerIndex, Apple M4 Pro):
  small_1k_docs  →  4.6 ms,  5 MB
  medium_10k_docs → 83 ms,  75 MB
  large_50k_docs  →  6.7 s, 489 MB   ← now absorbed in background

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BulkInsert required the caller to materialise the full []SymbolFTSRecord
slice before the transaction started (~400 MB for a 50k-file repo).
BulkInsertFunc calls a user-provided fn(flush) callback instead, letting
the caller feed records in chunks and never holding more than one chunk.

PopulateFTSFromSCIP now streams in 10k-record chunks. The ftsDropTriggers
/ ftsCreateTriggers DDL slices are extracted to package-level vars and
shared by both BulkInsert and BulkInsertFunc.

Benchmark (BenchmarkBulkInsertVsFunc, 500k symbols):
  BulkInsert:    6.6 s, 493 MB
  BulkInsertFunc: 6.3 s, 439 MB  (−55 MB caller alloc; further savings
                                    in practice where no full slice exists)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SemanticSearchWithLIP fired one WHERE file_path = ? query per LIP hit
(up to 20 for topK=20). Replace with a single WHERE file_path IN (…)
query via the new symbolsForFiles method.

SemanticSearchWithLIP signature changes from per-URI callback:
  func(fileURI string) []SearchResultItem
to batch callback:
  func(fileURIs []string) map[string][]SearchResultItem

The call site in symbols.go calls symbolsForFiles once for the full
URI list instead of looping.

Benchmark (BenchmarkSymbolsForFileVsBatch, Apple M4 Pro):
  5 files:  118 µs → 84 µs   (1.4×)
  10 files: 281 µs → 168 µs  (1.7×)
  20 files: 756 µs → 301 µs  (2.5×)

Also adds BenchmarkBuildCallerIndex to scale_bench_test.go so the
CallerIndex pre-warm cost is directly measurable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Go Mod Tidy
- Bump go directive 1.26.1 → 1.26.2 (fixes 4 crypto/tls + crypto/x509
  stdlib vulns: GO-2026-4866/4870/4946/4947; all fixed in go1.26.2)
- go mod tidy: promote golang.org/x/sys to direct dependency

Lint (gofmt)
- Format 20 files that gofmt disagreed with (pre-existing, not introduced
  by recent perf commits)

Lint (govet)
- cmd/ckb/impact.go:152: remove shadowed err via var+assign pattern
- internal/query/impact.go:471: drop tautological symbolInfo != nil check
  (symbolInfo is provably non-nil after the nil-return guard at line 259)

Lint (unused)
- internal/backends/scip/loader.go: remove unused convertDocuments func
  (only convertDocument singular is called; my bench test added that call)
- internal/cartographer/types.go: move ffiResponse out of types.go into
  bridge.go where it is used under //go:build cartographer; drop now-empty
  encoding/json import from types.go
- internal/query/fts.go: remove symbolsForFile (superseded by the new
  symbolsForFiles batch method; call site updated in the previous commit)

Lint (errcheck / check-type-assertions: true)
- internal/compliance/scanner.go:335: use two-value type assertion for
  sync.Pool.Get with nil fallback
- internal/query/golden_test.go:459,463,467: use _, ok form for map type
  assertions inside sort.Slice
- internal/watcher/watcher.go:276: use ok2 form for chan type assertion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

CKB review failed to generate output.

detectShotgunSurgery, getArchitecturalEvolution, and getBlastRadius now
use s.engine().GetRepoRoot() instead of requiring an explicit repo_path
parameter, consistent with every other Cartographer-backed tool.

analyzeCoupling suggests detectShotgunSurgery when high coupling is found.
@github-actions

CKB review failed to generate output.
